Mamba-7B is a 7-billion-parameter language model based on the Mamba architecture, trained for multiple epochs on the RefinedWeb dataset (1.2 trillion tokens). Mamba is a state space model that replaces self-attention with a selective state update, and it performs strongly on standard natural language benchmarks.
Tags: Large Language Model, English
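
As a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub and loadable through transformers' auto classes (the repo id below is an assumption, not confirmed by this card; check the actual model page for the supported loading path):

```python
# Minimal generation sketch. Assumes the weights are hosted on the
# Hugging Face Hub and compatible with transformers' AutoModelForCausalLM;
# the repo id is hypothetical here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TRI-ML/mamba-7b-rw"  # assumed repo id, verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # ~14 GB of weights in fp16 for a 7B model
    device_map="auto",          # requires the accelerate package
)

# Generate a short continuation from a prompt.
inputs = tokenizer("State space models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```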